Generation of Adaptive Vocabulary Lexicon for Japanese LVCSR
نویسنده
چکیده
One of the thorniest problems of large vocabulary continuous speech recognition systems is the large number of out-of-vocabulary (OOV) words. This is especially the case for the languages like Japanese, which has many inflections, compound words and loanwords. The OOV words vary with the application domains. It's not realistic to have a big general-purpose lexicon including any possible 00V words. Furthermore, embedded speech recognition systems become more and more popular recently. They strongly demand an economical and effective exploitation of lexicon space. In this paper, we introduce a lexicon development model dealing with different kinds of OOV words with the help of linguistic morphological knowledge. It provides unsupervised, fast vocabulary adaptation among different application domains. In the experiments that adapt a lexicon of typical vocabulary, we were able to reduce the OOV rate by 40% and improve the word segmentation error rate by 27%. And the smaller the lexicon is, the more we benefit from the vocabulary adaptation.
منابع مشابه
A hybrid language model for open-vocabulary Thai LVCSR
This paper investigates the use of a hybrid language model for open-vocabulary Thai LVCSR. Thai text is written without word boundary markers and the definition of word unit is often ambiguous due to the presence of compound words. Hence, to build open-vocabulary LVCSR, a very large lexicon is required to also handle word unit ambiguity. Pseudomorpheme (PM), a syllable-like sub-word unit specif...
متن کاملLanguage independent and language adaptive large vocabulary speech recognition
This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Tu...
متن کاملSpeech Input Acoustic Analysis Phoneme Inventory Pronunciation Lexicon
This paper gives an overview of an architecture and search organization for large vocabulary, continuous speech recognition (LVCSR at RWTH). In the rst part of the paper, we describe the principle and architecture of a LVCSR system. In particular, the issues of modeling and search for phoneme based recognition are discussed. In the second part, we review the word conditioned lexical tree search...
متن کاملSpeech Input Acoustic Analysis Phoneme Inventory Pronunciation Lexicon Language Model
This paper gives an overview of an architecture and search organization for large vocabulary, continuous speech recognition (LVCSR at RWTH). In the rst part of the paper, we describe the principle and architecture of a LVCSR system. In particular, the issues of modeling and search for phoneme based recognition are discussed. In the second part, we review the word conditioned lexical tree search...
متن کاملExperiments towards a Multi-language LVCSR Interface
This paper describes experiments towards a multilanguage human-computer speech interface. Our interface is designed for large vocabulary continuous speech input. For this purpose a multilingual dictation database has been collected under GlobalPhone, which is a project at the Interactive Systems Labs. This project investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, ...
متن کامل